DOCUMENT RESUME

ED 249 267                                              TM 840 620

AUTHOR        Poggio, John P.
TITLE         Practical Considerations When Setting Test Standards:
              A Look at the Process Used in Kansas.
PUB DATE      Apr 84
NOTE          13p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (68th, New
              Orleans, LA, April 23-27, 1984).
PUB TYPE      Speeches/Conference Papers (150) -- Reports -
              Descriptive (141) -- Tests/Evaluation Instruments
              (160)
EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   *Criterion Referenced Tests; *Cutting Scores;
              Elementary Secondary Education; *Minimum Competency
              Testing; *Scoring; *State Legislation; State
              Programs; State Standards; Testing Programs
IDENTIFIERS   *Angoff Methods; Ebel Method; Kansas; *Standard
              Setting

ABSTRACT
       Kansas law requires setting passing scores for the
reading and mathematics competency tests for 2nd, 4th, 6th, 8th, and
11th grade students, administered annually since April of 1980. New
objective-referenced tests are prepared each year. Both judgmental
(Angoff, Ebel, and Nedelsky) methods and empirical (contrasting
groups and borderline) methods have been used to set test standards.
While no one method appears to identify true cut scores and cut score
comparison over methods is consistent with other research, only the
Angoff and Ebel methods are currently being used. While problems were
found with all methods, empirical and Nedelsky methods were more
confusing to participants and yielded lower standards. A survey
approach has replaced panel judgment for data collection. It is more
efficient, permits a broader input base, and produces more
psychometrically favorable standards. A 26-member State Advising
Committee interpolates from the data gathered to set standards rather
than using the mathematics prescribed by the methods. The process,
while objective to a point, remains largely value-laden. Standard
data for each 1982 test are given. Sample survey forms and rating
sheets for the Angoff and Ebel methods are appended. (BS)


***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made      *
*                   from the original document.                       *
***********************************************************************






Practical Considerations When Setting Test Standards: 
A Look at the Process Used in Kansas
John P. Poggio
University of Kansas 






Kansas law requires competency testing of students at grades 2,
4, 6, 8 and 11. Students are tested in the areas of reading and
mathematics. Testing has been carried out in April of 1980, 1982 and 1983.
In each year, new objective-referenced tests are prepared. While
legislation does not call for promotion decisions or diploma awarding to be
tied to a student's performance on the tests, legislation does require
that a passing score be set for each of the 10 tests. Public reports
of building, district and state performance are oriented around the
proportion of students, by area and grade level, judged minimally competent.
As such, matters surrounding how the passing score was arrived at receive
considerable attention. This paper describes how standards are set in
the State, and highlights issues with which we have had to contend.

Methods Tried

Over the years we have attempted setting the test standards using
judgmental (Angoff, Ebel and Nedelsky) methods and empirical (contrasting
groups and borderline) methods. We have used the judgmental methods
in two formats. One format was to convene panels of judges and derive
standards in the mode commonly cited and discussed. The second format
has been to prepare a survey type questionnaire (see attachment) and send
it to large samples of judges across the state for rating.

The sections that follow discuss what we have learned over time
about each of the methods. It should be noted throughout this
presentation that (1) no one method has surfaced as the one to use that
identifies the true cut-score, and (2) comparisons of the cut-scores
over methods are altogether consistent with other research represented
in this area (Poggio and Glasnapp, 1981; 1982).

Empirical Methods: Contrasting Groups and Border Line 

(1) the methods are rather easily implementable;

(2) teachers report little difficulty in following what is to be
    done in Contrasting Groups;

(3) the Border Line method does create some confusion with teachers,
    i.e., "What do you mean just barely minimally competent?"

(4) the standards that result tend to be lower than those yielded
    by Angoff or Ebel;

(5) the standard becomes available typically well after actual testing;

(6) the public is both confused and tends to doubt the legitimacy
    of the standard when they (often) cannot understand the
    "statistical magic" which delivers the standard; and

(7) the methods give support to the often stated contention that
    "teachers can already tell us who is competent."
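
The "statistical magic" behind an empirical standard can be made concrete. The following is a minimal sketch of one common way a contrasting-groups cut score is derived, offered as an illustration and not as the computation used in Kansas: teachers sort students into competent and not competent without reference to test scores, and the cut score is the passing score that misclassifies the fewest students. The function name and the scores are hypothetical.

```python
def contrasting_groups_cut(competent_scores, not_competent_scores):
    """Return (cut, errors): the passing score with the fewest misclassifications.

    A student passes when score >= cut. When several cuts tie, the lowest
    one is returned (an arbitrary choice made for this sketch).
    """
    all_scores = list(competent_scores) + list(not_competent_scores)
    best_cut, best_errors = None, None
    # Try every integer cut from the lowest score to one above the highest.
    for cut in range(min(all_scores), max(all_scores) + 2):
        false_fail = sum(1 for s in competent_scores if s < cut)
        false_pass = sum(1 for s in not_competent_scores if s >= cut)
        errors = false_fail + false_pass
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical raw scores for students judged by their teachers.
competent = [38, 41, 44, 45, 47, 50, 52, 55]
not_competent = [25, 30, 33, 36, 39, 40, 42]
cut, errors = contrasting_groups_cut(competent, not_competent)
print(cut, errors)
```

Because the two score distributions overlap, no cut score classifies every student correctly, which is one reason the resulting standard invites the doubts described in (6).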

Judgmental Method: Nedelsky

(1) judges find this method very confusing and they report not being
    confident in their judgments;

(2) it can only be used by experienced teachers;

(3) in implementing the method judges tend not to be careful in their
    study of items, often marking the correct choice as being not a
    viable distractor; and

(4) it delivers a standard substantially below that of all other
    methods. In this context data from it are quickly ignored.
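
For reference, the arithmetic usually associated with the Nedelsky method can be sketched as follows. For each multiple-choice item a judge crosses out the options that a minimally competent student should recognize as wrong; the item's expected score is the reciprocal of the number of options left, and the judge's standard is the sum over items. The check on the keyed answer mirrors problem (3) above. This is a sketch under those assumptions; the function names and the items are hypothetical.

```python
def nedelsky_item_probability(options, key, eliminated):
    """Chance that a minimally competent student answers one item correctly."""
    options, eliminated = set(options), set(eliminated)
    if key not in options:
        raise ValueError("the keyed answer must be one of the options")
    if key in eliminated:
        # The error described in problem (3): the correct choice was crossed out.
        raise ValueError("the keyed (correct) answer was marked as eliminated")
    if not eliminated <= options:
        raise ValueError("eliminated choices must come from the item's options")
    return 1.0 / (len(options) - len(eliminated))


def nedelsky_standard(items):
    """Sum of item probabilities for one judge: the expected raw score."""
    return sum(nedelsky_item_probability(*item) for item in items)


# Hypothetical judge ratings: (options, keyed answer, eliminated options).
items = [
    ("ABCD", "B", "AC"),   # two options left
    ("ABCD", "D", "ABC"),  # only the key is left
    ("ABCD", "A", ""),     # nothing eliminated
    ("ABCD", "C", "A"),    # three options left
]
standard = nedelsky_standard(items)
print(round(standard, 2))
```

An item on which the judge eliminates nothing contributes only the chance level (one in four here), which helps explain why this method tends to produce low standards.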






Judgmental Method: Angoff

(1) easy to implement and understand either in a panel format or in
    a survey format;

(2) judges tend to establish their own "mean" level, causing
    considerable variability among the individual judge standards. This
    becomes particularly problematic in the panel approach when so few
    judges are used;

(3) having to define students "who are minimally competent" is a
    problem for many judges.
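
The variability noted in (2) is easy to see once each judge's ratings are turned into a standard. The following is a minimal sketch assuming the usual Angoff arithmetic, in which a judge's standard is the sum of that judge's item probabilities (recorded here on a 0 to 100 scale, as on the appended survey form). The judges and ratings are invented for illustration.

```python
from statistics import mean, stdev


def angoff_judge_standard(ratings):
    """Expected raw score of a minimally competent student for one judge."""
    if any(r < 0 or r > 100 for r in ratings):
        raise ValueError("ratings must lie between 0 and 100")
    return sum(ratings) / 100.0


# Hypothetical ratings of a five-item test by three judges.
judges = {
    "judge_a": [90, 80, 70, 60, 50],
    "judge_b": [70, 60, 50, 40, 30],
    "judge_c": [95, 85, 80, 75, 65],
}
standards = {name: angoff_judge_standard(r) for name, r in judges.items()}
panel_mean = mean(list(standards.values()))
panel_sd = stdev(list(standards.values()))
print(standards, round(panel_mean, 2), round(panel_sd, 2))
```

With only a handful of judges, one judge who rates consistently high or low moves the panel mean noticeably; a large survey sample dampens that effect.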

Judgmental Method: Ebel

(1) the task itself is time consuming. Fatigue and boredom can
    become a problem;

(2) the method is rather easy for most judges to understand and can
    be implemented without difficulty;

(3) the relevance rating position of "Questionable" causes judges to
    become concerned about the method;

(4) the cell percent passing task causes real difficulty/debate over
    the "Questionable" dimension; and

(5) when computing the standard, it can vary considerably depending on
    whether it is computed by judge, or based on the group cell values.
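
Point (5) can be illustrated with a small computation. In the Ebel method every item falls into a relevance-by-difficulty cell, and every cell carries a percent of items a minimally competent student should answer correctly. The sketch below contrasts two ways of combining judges; the "group cell values" rule used here (modal cell per item, mean percent per cell) is one plausible reading and an assumption, as are the function names and data.

```python
from collections import Counter
from statistics import mean


def ebel_by_judge(classifications, cell_percents):
    """Mean of per-judge standards, each built from that judge's own ratings."""
    per_judge = [
        sum(cell_percents[judge][cell] for cell in cells) / 100.0
        for judge, cells in classifications.items()
    ]
    return mean(per_judge)


def ebel_by_group_cells(classifications, cell_percents):
    """Standard from pooled values: modal cell per item, mean percent per cell."""
    judges = list(classifications)
    n_items = len(classifications[judges[0]])
    total = 0.0
    for i in range(n_items):
        votes = Counter(classifications[j][i] for j in judges)
        modal_cell = votes.most_common(1)[0][0]  # ties go to the first cell seen
        total += mean(cell_percents[j][modal_cell] for j in judges)
    return total / 100.0


# Hypothetical data: cells are (relevance, difficulty) pairs for a 4-item test.
EE, EM, IH, QH = ("E", "E"), ("E", "M"), ("I", "H"), ("Q", "H")
classifications = {
    "j1": [EE, EM, IH, QH],
    "j2": [EE, EM, IH, IH],
    "j3": [EE, IH, IH, QH],
}
cell_percents = {
    "j1": {EE: 90, EM: 80, IH: 50, QH: 20},
    "j2": {EE: 100, EM: 70, IH: 60, QH: 10},
    "j3": {EE: 95, EM: 75, IH: 40, QH: 30},
}
by_judge = ebel_by_judge(classifications, cell_percents)
by_group = ebel_by_group_cells(classifications, cell_percents)
print(round(by_judge, 2), round(by_group, 2))
```

Even on this toy test the two routes disagree, because pooling discards each judge's minority classifications before the percents are applied.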

In practice, we no longer use the Nedelsky or empirical methods for
reasons given above. We rely on gathering standards data on both the
Ebel and Angoff methods. Also we have abandoned the use of panels. We
have found that the survey approach (1) is far more efficient relative
to time and cost, (2) permits a broader base for input to the
decision-making process, and (3) yields standards comparable to those
from panels, with psychometric properties that favor the survey approach.

Yet, once the data are obtained the actual setting of each test's
standard is not solved by the mathematics prescribed by the method. In
fact, it is interpolated from the data gathered (see Table I) by a
26-member State Advising Committee.

The process, while objective to a point, remains largely value
laden.



References

Poggio, J. P., Glasnapp, D. R. and Eros, D. S. An empirical
     investigation of the Angoff, Ebel and Nedelsky standard setting
     methods. Paper presented at the annual meeting of the American
     Educational Research Association, Los Angeles, 1981.

Poggio, J. P., Glasnapp, D. R. and Eros, D. S. An evaluation of
     contrasting groups methods for setting test standards. Paper
     presented at the annual meeting of the American Educational
     Research Association, New York, 1982.

Poggio, J. P., Glasnapp, D. R. and Eros, D. S. An analysis of the
     validity of judgmental methods used to set test standards.
     Paper presented at the annual meeting of the American
     Educational Research Association, Montreal, 1983.






Table I

STANDARD DATA - KANSAS MCT - 1982

                                                          Test Standard
Grade  Area   Method   N     P40    Mean   Median   P60       Used

  2    Read.  Angoff   41   34.50   35.93   37.63   38.20       35
              Ebel     41   34.00   34.43   34.21   34.65

  2    Math.  Angoff   47   32.00   33.45   35.00   36.50       34
              Ebel     41   33.40   34.25   34.05   35.20

  4    Read.  Angoff   38   40.30   41.84   45.83   46.50       45
              Ebel     39   44.41   45.17   45.31   46.15

  4    Math.  Angoff   41   43.50   45.29   46.33   4?.75       43
              Ebel     43   41.27   41.89   41.86   42.79

  6    Read.  Angoff   43   41.80   43.95   45.00   46.00       45
              Ebel     37   45.27   46.01   45.41   46.42

  6    Math.  Angoff   38   42.84   44.10   46.75   47.20       45
              Ebel     39   45.75   46.13   46.13   46.65

  8    Read.  Angoff   38   40.50   41.21   44.83   46.4?       43
              Ebel     4?   43.21   43.16   43.72   43.97

  8    Math.  Angoff   40   40.10   42.40   42.50   44.09       42
              Ebel     37   41.15   41.85   41.80   42.26

 11    Read.  Angoff   38   44.25   45.66   46.90   47.80       46
              Ebel     38   46.70   47.27   47.65   47.75

 11    Math.  Angoff   38   38.50   39.58   41.50   43.50       ?1
              Ebel     33   41.20   42.44   42.12   42.55

? marks a digit that is illegible in the source copy.
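
Table I reports, for each test and method, the number of judges and summary statistics of their individual standards (mean, median and, as the column headings appear to read, the 40th and 60th percentiles). The following is a minimal sketch of how one such row could be produced. The percentile rule (linear interpolation between order statistics) is an assumption, since the paper does not say which rule was used, and the judge standards are invented.

```python
from statistics import mean, median


def percentile(values, p):
    """p-th percentile by linear interpolation between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one value is required")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize(judge_standards):
    """One row of summary statistics for a set of individual judge standards."""
    return {
        "N": len(judge_standards),
        "P40": percentile(judge_standards, 40),
        "Mean": mean(judge_standards),
        "Median": median(judge_standards),
        "P60": percentile(judge_standards, 60),
    }


# Hypothetical standards from eleven judges on a 60-item test.
row = summarize([30, 32, 34, 35, 36, 38, 40, 41, 42, 44, 46])
print(row)
```

A summary of this kind gives a committee a band (here, from the 40th to the 60th percentile) within which to settle on a whole-number standard, which is consistent with the standards used in Table I generally lying near the reported statistics.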



SAMPLE SURVEY FORM USED WITH THE ANGOFF METHOD
DIRECTIONS FOR STANDARD SETTING

Attached you will find a copy of the Kansas Minimum Competency
Test in Reading for Grade 11. Your task is to read an item, then estimate
the probability that the minimally competent student in Grade 11 in Reading
would answer the item correctly. You are to assign probabilities on a scale
from 0 to 100, where 0 would mean that the minimally competent student has
no chance of answering the item correctly, and 100 would mean that the
minimally competent child is certain to answer the item correctly. Your
estimate of the probability that an item will be answered correctly is to
be recorded on the separate response sheet.

For example, consider the item:

(Sample) Which of the following is the opposite of happy?

          A. glad

          C. jump

          D. fliad

If you believe that a minimally competent child has a 96 percent chance
of answering this item correctly, then you would write 96 on the response
sheet. You are to indicate the probability of a correct response by the
minimally competent child to each item on the attached response sheet.

When making your judgements about each item, use the following
guidelines:

     1. Use your own definition of a minimally competent student
        at the grade level for the test you are reviewing. It is
        best to think of the skill level of a minimally competent
        group of students rather than a single individual.

     2. Do not review the performance of your students on the test
        prior to making your judgements. Rather, let your expert
        opinion and experience dictate the likelihood of a correct
        response by a minimally competent student.






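Use this sheet to rate: GRADE 11 READING

RESPONSE SHEET

DIRECTIONS: Read a test item, then judge the probability that the minimally
            competent student in this grade level for this content area
            would answer the item correctly. Probabilities can be assigned
            from 0 (no chance) to 100 (absolutely certain). Write the
            probability on the line next to each item number.

        Probability of            Probability of            Probability of
Item    Correct Answer    Item    Correct Answer    Item    Correct Answer

  1     ______             21     ______             41     ______
  2     ______             22     ______             42     ______
  3     ______             23     ______             43     ______
  4     ______             24     ______             44     ______
  5     ______             25     ______             45     ______
  6     ______             26     ______             46     ______
  7     ______             27     ______             47     ______
  8     ______             28     ______             48     ______
  9     ______             29     ______             49     ______
 10     ______             30     ______             50     ______
 11     ______             31     ______             51     ______
 12     ______             32     ______             52     ______
 13     ______             33     ______             53     ______
 14     ______             34     ______             54     ______
 15     ______             35     ______             55     ______
 16     ______             36     ______             56     ______
 17     ______             37     ______             57     ______
 18     ______             38     ______             58     ______
 19     ______             39     ______             59     ______
 20     ______             40     ______             60     ______

Thank you for your assistance with this activity. Please return these
materials to the person who gave them to you.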






SAMPLE SURVEY FORM USED WITH THE EBEL METHOD
ACTIVITY 1: DIRECTIONS FOR STANDARD SETTING

Attached you will find a copy of the Kansas Minimum Competency Test
in Mathematics for Grade 8. Your task is to read an item, then make two
separate judgments about the item, one for difficulty and one for relevance.
Directions for making the judgments follow.

RATING FOR DIFFICULTY. After reading an item, judge how difficult the item
is for Grade 8 students. Each item is to be rated as either:

             EASY (E)
             MEDIUM (M)
          or HARD (H)

For example, consider the item:

(Sample) Which number is greater than 39?

          A. 95
          B. 30
          C. 50
          D. 10

If you believe this item is easy for Grade 8 students you would circle
the E by this item on the response sheet. Difficulty ratings are to be made
for each item and recorded on the response sheet before beginning the next
rating task.

RATING FOR RELEVANCE. Each item tests a skill in mathematics. Next, after
reading an item, judge how relevant the item is as a measure of a minimum
competency skill in Grade 8. The item may be rated as either:

             ESSENTIAL (E)
             IMPORTANT (I)
             ACCEPTABLE (A)
          or QUESTIONABLE (Q)

For example, if you believe that the sample item is essential as a
minimal competency for Grade 8 students in mathematics, you would circle E
on the response sheet. Relevance ratings are to be completed for all
items and recorded on the appropriate response sheet before beginning the
final activity.






Use this sheet to rate: GRADE 8 MATHEMATICS

ACTIVITY 1: ITEM DIFFICULTY-RESPONSE SHEET

DIRECTIONS: Read each test item, then judge how difficult the item is for
            students at Grade 8. Rate each item on difficulty by circling:

                 E for easy
                 M for medium
                 H for hard

Item    Difficulty      Item    Difficulty      Item    Difficulty

  1     E  M  H          21     E  M  H          41     E  M  H
  2     E  M  H          22     E  M  H          42     E  M  H
  3     E  M  H          23     E  M  H          43     E  M  H
  4     E  M  H          24     E  M  H          44     E  M  H
  5     E  M  H          25     E  M  H          45     E  M  H
  6     E  M  H          26     E  M  H          46     E  M  H
  7     E  M  H          27     E  M  H          47     E  M  H
  8     E  M  H          28     E  M  H          48     E  M  H
  9     E  M  H          29     E  M  H          49     E  M  H
 10     E  M  H          30     E  M  H          50     E  M  H
 11     E  M  H          31     E  M  H          51     E  M  H
 12     E  M  H          32     E  M  H          52     E  M  H
 13     E  M  H          33     E  M  H          53     E  M  H
 14     E  M  H          34     E  M  H          54     E  M  H
 15     E  M  H          35     E  M  H          55     E  M  H
 16     E  M  H          36     E  M  H          56     E  M  H
 17     E  M  H          37     E  M  H          57     E  M  H
 18     E  M  H          38     E  M  H          58     E  M  H
 19     E  M  H          39     E  M  H          59     E  M  H
 20     E  M  H          40     E  M  H          60     E  M  H



Use this sheet to rate: GRADE 8 MATHEMATICS

ACTIVITY 1: ITEM RELEVANCE-RESPONSE SHEET

DIRECTIONS: Read each test item, then judge how relevant the item is as
            a measure of a minimum competency for students in Grade 8.
            Rate each item on relevance by circling:

                 E for essential
                 I for important
                 A for acceptable
                 Q for questionable

Item    Relevance       Item    Relevance       Item    Relevance

  1     E  I  A  Q       21     E  I  A  Q       41     E  I  A  Q
  2     E  I  A  Q       22     E  I  A  Q       42     E  I  A  Q
  3     E  I  A  Q       23     E  I  A  Q       43     E  I  A  Q
  4     E  I  A  Q       24     E  I  A  Q       44     E  I  A  Q
  5     E  I  A  Q       25     E  I  A  Q       45     E  I  A  Q
  6     E  I  A  Q       26     E  I  A  Q       46     E  I  A  Q
  7     E  I  A  Q       27     E  I  A  Q       47     E  I  A  Q
  8     E  I  A  Q       28     E  I  A  Q       48     E  I  A  Q
  9     E  I  A  Q       29     E  I  A  Q       49     E  I  A  Q
 10     E  I  A  Q       30     E  I  A  Q       50     E  I  A  Q
 11     E  I  A  Q       31     E  I  A  Q       51     E  I  A  Q
 12     E  I  A  Q       32     E  I  A  Q       52     E  I  A  Q
 13     E  I  A  Q       33     E  I  A  Q       53     E  I  A  Q
 14     E  I  A  Q       34     E  I  A  Q       54     E  I  A  Q
 15     E  I  A  Q       35     E  I  A  Q       55     E  I  A  Q
 16     E  I  A  Q       36     E  I  A  Q       56     E  I  A  Q
 17     E  I  A  Q       37     E  I  A  Q       57     E  I  A  Q
 18     E  I  A  Q       38     E  I  A  Q       58     E  I  A  Q
 19     E  I  A  Q       39     E  I  A  Q       59     E  I  A  Q
 20     E  I  A  Q       40     E  I  A  Q       60     E  I  A  Q



ACTIVITY 2: REQUIRED PERFORMANCE LEVELS

This activity involves making judgments about general categories of
items. Based on difficulty level and relevance of items, 12 separate
categories of items may be found on a test.

 1. Consider a set of 100 test items all of which have been judged
    ESSENTIAL and HARD. How many of these 100 items should a student
    be able to answer correctly in order to be judged minimally
    competent?
                                                        ______ items

 2. Consider a set of 100 test items all of which have been judged
    ESSENTIAL and of MEDIUM DIFFICULTY. How many of these 100 items
    should a student be able to answer correctly in order to be
    judged minimally competent?
                                                        ______ items

 3. Consider a set of 100 test items all of which have been judged
    ESSENTIAL and EASY. How many of these 100 items should a student
    be able to answer correctly in order to be judged minimally
    competent?
                                                        ______ items

 4. Consider a set of 100 test items all of which have been judged
    IMPORTANT and HARD. How many of these 100 items should a student
    be able to answer correctly in order to be judged minimally
    competent?
                                                        ______ items

 5. Consider a set of 100 test items all of which have been judged
    IMPORTANT and of MEDIUM DIFFICULTY. How many of these 100 items
    should a student be able to answer correctly in order to be judged
    minimally competent?
                                                        ______ items

 6. Consider a set of 100 test items all of which have been judged
    IMPORTANT and EASY. How many of these 100 items should a student
    be able to answer correctly in order to be judged minimally
    competent?
                                                        ______ items

 7. Consider a set of 100 test items all of which have been judged
    ACCEPTABLE and HARD. How many of these 100 items should a student
    be able to answer correctly in order to be judged minimally
    competent?
                                                        ______ items

 8. Consider a set of 100 test items all of which have been judged
    ACCEPTABLE and of MEDIUM DIFFICULTY. How many of these 100 items
    should a student be able to answer correctly in order to be judged
    minimally competent?
                                                        ______ items

 9. Consider a set of 100 test items all of which have been judged
    ACCEPTABLE and EASY. How many of these 100 items should a student
    be able to answer correctly in order to be judged minimally
    competent?
                                                        ______ items

10. Consider a set of 100 test items all of which have been judged
    of QUESTIONABLE RELEVANCE and HARD. How many of these 100 items
    should a student be able to answer correctly in order to be judged
    minimally competent?
                                                        ______ items

11. Consider a set of 100 test items all of which have been judged
    of QUESTIONABLE RELEVANCE and of MEDIUM DIFFICULTY. How many of
    these 100 items should a student be able to answer correctly in
    order to be judged minimally competent?
                                                        ______ items

12. Consider a set of 100 test items all of which have been judged
    of QUESTIONABLE RELEVANCE and EASY. How many of these 100 items
    should a student be able to answer correctly in order to be
    judged minimally competent?
                                                        ______ items

Thank you for your assistance with this activity. Please return these
materials to the person who gave them to you.



